39 research outputs found

    Improving binary classification using filtering based on k-NN proximity graphs

    © 2020, The Author(s). One way of increasing recognition ability in a classification problem is removing outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse a classifier and lead it to learn wrong patterns in the training data. A common approach to data filtering is the use of proximity graphs; however, the selection of optimal filtering parameters is still insufficiently researched. In this paper a filtering procedure based on the k-nearest-neighbours (k-NN) proximity graph is used. Filtering parameter selection is formulated as an outlier minimization problem: the k-NN proximity graph, the power of distance, and the threshold parameters are selected so as to minimize the percentage of outliers in the training data. The performance of six commonly used classifiers (Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine and Decision Tree) and one heterogeneous classifier combiner (DES-LA) is then compared with and without filtering. Dynamic ensemble selection (DES) systems work by estimating the level of competence of each classifier from a pool of classifiers; only the most competent ones are selected to classify a given test sample. This is achieved by defining a criterion to measure the competence of the base classifiers, such as their accuracy in local regions of the feature space around the query instance. In our case the combiner is based on the local accuracy of the single classifiers, and its output is a linear combination of the single classifiers' rankings. After filtering, the accuracy of the DES-LA combiner increases substantially for low-accuracy datasets, but filtering has little impact on DES-LA performance on high-accuracy datasets. The results are discussed, and the classifiers whose performance was most affected by the pre-processing filtering step are identified. The main contributions of the paper are modifications to the DES-LA combiner and a comparative analysis of the impact of filtering on classifiers of various types. Testing the filtering algorithm on a real-world dataset (the Taiwan default credit card dataset) confirmed the efficiency of the automatic filtering approach.
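
A minimal sketch of the filtering idea described above, not the paper's exact procedure: a training point is flagged as an outlier when the inverse-distance-weighted vote of its k nearest neighbours disagrees with its own label beyond a threshold, and (k, distance power, threshold) are chosen by a small grid search that minimizes the flagged-outlier percentage. The scoring rule and parameter grids are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def outlier_mask(X, y, k, p, threshold):
    """Flag points whose distance-weighted neighbourhood vote contradicts their label."""
    y = np.asarray(y)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)          # first neighbour is the point itself
    dist, idx = dist[:, 1:], idx[:, 1:]
    w = 1.0 / np.power(dist + 1e-12, p)   # inverse-distance weights, power p
    agree = (y[idx] == y[:, None]).astype(float)
    support = (w * agree).sum(axis=1) / w.sum(axis=1)
    return support < threshold            # low label support -> likely outlier

def select_filter_params(X, y, ks=(3, 5, 7), ps=(1, 2), thresholds=(0.3, 0.4, 0.5)):
    """Pick (k, p, threshold) that minimises the share of flagged outliers."""
    best = None
    for k in ks:
        for p in ps:
            for t in thresholds:
                rate = outlier_mask(X, y, k, p, t).mean()
                if best is None or rate < best[0]:
                    best = (rate, k, p, t)
    return best  # (outlier_rate, k, p, threshold)
```

The cleaned training set would then simply be the rows where the mask is False, and the six classifiers (and DES-LA) would be retrained on it for the with/without-filtering comparison.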

    Modelling customers credit card behaviour using bidirectional LSTM neural networks

    With the rapid growth of consumer credit and the huge amount of financial data available, developing effective credit scoring models is crucial. Researchers have developed complex credit scoring models using statistical and artificial intelligence (AI) techniques to help banks and financial institutions support their financial decisions. Neural networks are among the most widely used techniques in finance and business applications. The main aim of this paper is therefore to help bank management score credit card clients using machine learning, by modelling and predicting consumer behaviour with respect to two aspects: the probability of single and of consecutive missed payments for credit card customers. The proposed model is based on the bidirectional Long Short-Term Memory (LSTM) model and gives the probability of a missed payment during the next month for each customer. The model was trained on a real credit card dataset, and the customer behavioural scores are analysed using classical measures such as accuracy, area under the curve, Brier score, the Kolmogorov–Smirnov test, and the H-measure. Calibration analysis of the LSTM model scores showed that they can be considered probabilities of missed payments. The LSTM model was compared to four traditional machine learning algorithms: support vector machine, random forest, multi-layer perceptron neural network, and logistic regression. Experimental results show that, compared with the traditional methods, the consumer credit scoring method based on the LSTM neural network significantly improves consumer credit scoring.
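
A minimal sketch of the kind of model the abstract describes, not the authors' exact architecture: a bidirectional LSTM over a customer's monthly behavioural history that outputs the probability of a missed payment in the next month. The sequence length, feature count, and layer sizes below are illustrative assumptions.

```python
import tensorflow as tf

n_months, n_features = 12, 8   # assumed: 12 months of 8 behavioural features per customer
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_months, n_features)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # probability of a missed payment next month
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
# model.fit(X_train, y_train, validation_split=0.2, epochs=10)
# X_train shape: (n_customers, n_months, n_features); y_train: 1 if next month is missed
```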

    A deep learning model for behavioural credit scoring in banks

    The main aim of this paper is to help bank management score credit card clients using machine learning, by modelling and predicting consumer behaviour concerning three aspects: the probability of single and of consecutive missed payments for credit card customers, the purchasing behaviour of customers, and the grouping of customers based on a mathematical expectation of loss. Two models are developed: the first provides the probability of a missed payment during the next month for each customer and is described as the Missed Payment prediction Long Short-Term Memory model (MP-LSTM), whilst the second estimates the total monthly amount of purchases and is defined as the Purchase Estimation prediction Long Short-Term Memory model (PE-LSTM). Based on both models, a customer behavioural grouping is provided, which can be helpful for the bank's decision-making. Both models are trained on real credit card transactional datasets. Customer behavioural scores are analysed using classical performance evaluation measures. Calibration analysis of the MP-LSTM scores showed that they could be considered probabilities of missed payments. The obtained purchase estimates were analysed using mean square error and absolute error. The MP-LSTM model was compared to four traditional, well-known machine learning algorithms. Experimental results show that, compared with conventional methods based on feature extraction, the consumer credit scoring method based on the MP-LSTM neural network significantly improves consumer credit scoring.
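
A hedged sketch of the behavioural grouping step only: the MP-LSTM missed-payment probability and the PE-LSTM purchase estimate are combined into an expected-loss proxy and customers are bucketed by quantiles. The product formula and the quartile grouping are assumptions; the abstract does not state the paper's exact rule.

```python
import numpy as np

def group_customers(p_missed, est_purchase, n_groups=4):
    """Bucket customers by an assumed expected-loss proxy: P(missed) * estimated purchases."""
    expected_loss = np.asarray(p_missed) * np.asarray(est_purchase)
    edges = np.quantile(expected_loss, np.linspace(0, 1, n_groups + 1))
    groups = np.digitize(expected_loss, edges[1:-1])   # 0 = lowest-risk group
    return groups, expected_loss
```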

    Adaptive computation of multiscale entropy and its application in EEG signals for monitoring depth of anesthesia during surgery

    Entropy, as an estimate of the complexity of the electroencephalogram (EEG), is an effective parameter for monitoring the depth of anesthesia (DOA) during surgery. Multiscale entropy (MSE) is useful for evaluating the complexity of signals over different time scales; however, the length of the processed signal limits how well the variation of sample entropy (SE) can be observed across scales. In this study, an adaptive resampling procedure is employed to replace the coarse-graining process in MSE. Analysis of various synthetic signals and of real EEG signals shows that it is feasible to calculate SE from the adaptively resampled signals, and that the results are highly similar to the original MSE at small scales. The distribution of the MSE of the EEG over the whole surgery, based on the adaptive resampling process, is able to show the detailed variation of SE at small scales and the complexity of the EEG, which could help anesthesiologists evaluate the status of patients. This work was supported by the Center for Dynamical Biomarkers and Translational Medicine, National Central University, Taiwan, which is sponsored by the National Science Council (Grant Number: NSC 100-2911-I-008-001); by the Chung-Shan Institute of Science & Technology in Taiwan (Grant Numbers: CSIST-095-V101 and CSIST-095-V102); and by the National Science Foundation of China (No. 50935005).
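
A minimal sketch of multiscale entropy in which the usual coarse-graining step is replaced by resampling, in the spirit of the procedure above. scipy.signal.resample is used here as a simple stand-in resampler; the paper's adaptive resampling scheme is more elaborate, and the SampEn parameters (m = 2, r = 0.15·std) are conventional assumptions.

```python
import numpy as np
from scipy.signal import resample

def sample_entropy(x, m=2, r_factor=0.15):
    """Plain SampEn(m, r) with r = r_factor * std(x); O(n^2) template comparison."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()
    def match_count(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        return (np.sum(d <= r) - len(templates)) / 2   # pairs within r, excluding self-matches
    b, a = match_count(m), match_count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def mse_by_resampling(x, scales=range(1, 11), m=2):
    """SampEn of the signal resampled to len(x)//scale points, one value per scale."""
    return [sample_entropy(resample(x, len(x) // s), m=m) for s in scales]
```

In the classical MSE the scale-s series is obtained by averaging non-overlapping windows of s samples; the resampling variant keeps the series length explicit, which is what makes the small-scale behaviour easier to track on short EEG segments.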

    Genetic folding for solving multiclass SVM problems

    The Genetic Folding (GF) algorithm is a new class of evolutionary algorithm specialized for complicated computational problems. The GF algorithm uses a linear sequence of genes structurally organized as integer numbers separated by dots. The encoded chromosomes in the population are evaluated using a fitness function; the fittest chromosomes survive and are subjected to modification by genetic operators. The design of these encoded chromosomes, together with the fitness function and the genetic operators, allows the algorithm to perform with high efficiency over the genetic folding life cycle. Multi-classification problems have been chosen to illustrate the power and versatility of GF. In classification problems, the kernel function is important for constructing binary and multiclass support vector machine classifiers. Different types of standard kernel functions have been compared with our proposed algorithm, and promising results are shown in comparison with other published work.
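
A hedged sketch of the baseline comparison only: multiclass SVMs with the standard kernels versus one hand-written composite kernel passed as a callable. Evolving kernel expressions with Genetic Folding itself is not shown; the composite kernel and the Iris dataset are illustrative assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

def custom_kernel(A, B):
    # example composite kernel (linear + RBF); a GF chromosome would encode
    # such an expression instead of hard-coding it
    gamma = 0.5
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return A @ B.T + np.exp(-gamma * sq)

for kernel in ["linear", "poly", "rbf", "sigmoid", custom_kernel]:
    clf = SVC(kernel=kernel)
    name = kernel if isinstance(kernel, str) else "custom"
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```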

    Nonlinear and conventional biosignal analyses applied to tilt table test for evaluating autonomic nervous system and autoregulation

    Copyright © Tseng et al.; Licensee Bentham Open. This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited. The tilt table test (TTT) is a standard examination for patients with suspected autonomic nervous system (ANS) dysfunction or uncertain causes of syncope. Currently, the analytical methods based on blood pressure (BP) or heart rate (HR) changes during the TTT are linear, but normal physiological modulations of BP and HR are thought to be predominantly nonlinear. This study therefore consists of two parts. The first part analyses the HR during TTT using three methods to distinguish normal controls from subjects with ANS dysfunction: power spectrum density (PSD), detrended fluctuation analysis (DFA), and multiscale entropy (MSE), which quantifies the complexity of the system. The second part analyses BP and cerebral blood flow velocity (CBFV) changes during TTT, using two measures to compare the results: correlation coefficient analysis (nMxa) and MSE. The first part of the study concludes that the ratio of low-frequency power to total power of the PSD, and the MSE method, are better than DFA at distinguishing normal controls from patient groups. In the second part, the nMxa computed with a moving-average window over the three stages is better than the nMxa computed with all three stages together, and the MSE analysis of the BP data performs better than that of the CBFV data. This work was supported by the Stroke Center and Department of Neurology, National Taiwan University, the National Science Council in Taiwan, and the Center for Dynamical Biomarkers and Translational Medicine, National Central University, which is sponsored by the National Science Council and Min-Sheng General Hospital, Taoyuan.
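
A minimal sketch of one of the linear measures above: the ratio of low-frequency power to total power of a heart-rate series, estimated with Welch's PSD. The 0.04–0.15 Hz LF band and the 4 Hz resampling rate follow common HRV conventions and are assumptions about the paper's exact settings.

```python
import numpy as np
from scipy.signal import welch

def lf_total_ratio(hr, fs=4.0, lf_band=(0.04, 0.15)):
    """hr: heart-rate series evenly resampled at fs Hz; returns LF power / total power."""
    f, pxx = welch(hr, fs=fs, nperseg=min(256, len(hr)))
    in_lf = (f >= lf_band[0]) & (f < lf_band[1])
    lf_power = np.trapz(pxx[in_lf], f[in_lf])
    total_power = np.trapz(pxx, f)
    return lf_power / total_power
```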

    An assessment of pulse transit time for detecting heavy blood loss during surgical operation

    Copyright © Wang et al.; Licensee Bentham Open. This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited. The main contribution of this paper is the use of non-invasive measurements, namely electrocardiogram (ECG) and photoplethysmographic (PPG) pulse oximetry waveforms, to develop a new physiological signal analysis technique for detecting blood loss during surgical operations. Urological surgery cases were taken as the control group, owing to their generality, and cardiac surgery cases as the experimental group, since they involve blood loss and water supply. Results show that the control group tends towards a reduction of the pulse transit time (PTT), indicating that blood flow velocity changes from slow to fast, while for the experimental group the PTT shows high values during blood loss and low values during water supply. Statistical analysis shows significant differences (P < 0.05) between the two groups, leading to the conclusion that PTT could be a good indicator for monitoring patients' blood loss during a surgical operation. This work was supported by the National Science Council (NSC) of Taiwan and the Centre for Dynamical Biomarkers and Translational Medicine, National Central University, Taiwan.
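
A minimal sketch of pulse transit time estimation from the two waveforms: for each ECG R peak, take the delay to the next PPG pulse peak. The peak-detection thresholds and the use of the PPG peak (rather than its foot) are simplifying assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

def pulse_transit_times(ecg, ppg, fs):
    """Return PTT values (seconds) as the delay from each ECG R peak to the next PPG pulse peak."""
    r_peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), prominence=np.std(ecg))
    p_peaks, _ = find_peaks(ppg, distance=int(0.4 * fs), prominence=np.std(ppg))
    ptt = []
    for r in r_peaks:
        following = p_peaks[p_peaks > r]
        if following.size:
            ptt.append((following[0] - r) / fs)
    return np.array(ptt)
```

A rising PTT trend would then be read as slowing pulse-wave arrival (as reported during blood loss), and a falling trend as faster arrival.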

    Hip fracture risk assessment: Artificial neural network outperforms conditional logistic regression in an age- and sex-matched case control study

    Copyright © 2013 Tseng et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background - Osteoporotic hip fractures, with significant morbidity and excess mortality among the elderly, have imposed huge health and economic burdens on societies worldwide. In this age- and sex-matched case-control study, we examined the risk factors of hip fractures, assessed fracture risk by conditional logistic regression (CLR) and an ensemble artificial neural network (ANN), and compared the performance of these two classifiers. Methods - The study population consisted of 217 pairs (149 women and 68 men) of fracture cases and controls older than 60 years. All participants were interviewed with the same standardized questionnaire covering 66 risk factors in 12 categories. Univariate CLR analysis was initially conducted to examine the unadjusted odds ratio of all potential risk factors, and the significant risk factors were then tested by multivariate analyses. For fracture risk assessment, the participants were randomly divided into modeling and testing datasets for 10-fold cross-validation analyses. The predictive models built by CLR and ANN on the modeling datasets were applied to the testing datasets for generalization study, and their performance, including discrimination and calibration, was compared with non-parametric Wilcoxon tests. Results - In univariate CLR analyses, 16 variables reached significance, and six of them remained significant in multivariate analyses: low T score, low BMI, low MMSE score, milk intake, walking difficulty, and a significant fall at home. For discrimination, ANN outperformed CLR in both the 16- and 6-variable analyses in the modeling and testing datasets (p < 0.005). For calibration, ANN outperformed CLR only in the 16-variable analyses in the modeling and testing datasets (p = 0.013 and 0.047, respectively). Conclusions - The risk factors of hip fracture are more personal than environmental. With adequate model construction, ANN may outperform CLR in both discrimination and calibration. ANN seems not yet to have been developed to its full potential, and efforts should be made to improve its performance. This work was supported by the National Health Research Institutes in Taiwan.
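
A hedged sketch of the discrimination comparison: 10-fold cross-validated AUC for a small feed-forward network versus logistic regression on the risk-factor matrix. Plain (unconditional) logistic regression stands in here for the paper's conditional logistic regression, which properly accounts for the matched case-control design; layer size and preprocessing are assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "ann": make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)),
}
# X: matrix of risk factors (e.g. the 6 significant variables), y: 1 = fracture, 0 = control
# for name, model in models.items():
#     auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
#     print(name, auc.mean(), auc.std())
```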

    A Comparison of Different Algorithms for EEG Signal Analysis for the Purpose of Monitoring Depth of Anesthesia

    All rights reserved. Electroencephalography (EEG) signals have been commonly used for assessing the level of anesthesia during surgery. However, the collected EEG signals are usually corrupted with artifacts, which can seriously reduce the accuracy of depth of anesthesia (DOA) monitors. The main purpose of this paper is to compare five EEG-based anesthesia indices, namely median frequency (MF), 95% spectral edge frequency (SEF), approximate entropy (ApEn), sample entropy (SampEn) and permutation entropy (PeEn), for their ability to reject artifacts and thus measure the DOA accurately. The analysis is based on synthesized EEG corrupted with four different types of artificial artifacts and on real data collected from patients undergoing general anesthesia during surgery. The experimental results demonstrate that all indices can discriminate the awake from the anesthetized state (p < 0.05), but PeEn is superior to the other indices. Furthermore, a combined index is obtained by using these five indices as inputs to train, validate and test a feed-forward back-propagation artificial neural network (ANN) model with the bispectral index (BIS) as target. The combined index via the ANN offers further advantages, with a higher correlation of 0.80 ± 0.01, for real-time DOA monitoring in comparison with the single indices. This work was supported by the Center for Dynamical Biomarkers and Translational Medicine, National Central University, Taiwan, which is sponsored by the Ministry of Science and Technology (Grant Number: MOST103-2911-I-008-001), and by the National Natural Science Foundation of China (Grant Number: 51475342).
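
A minimal sketch of one of the five indices (permutation entropy) and of the combined index, i.e. a feed-forward network regressing the five indices onto BIS. The embedding order, delay, and network size are assumptions, and the feature/target arrays are placeholders.

```python
import numpy as np
from itertools import permutations
from sklearn.neural_network import MLPRegressor

def permutation_entropy(x, order=3, delay=1):
    """Normalised permutation entropy of a 1-D signal (0 = regular, 1 = maximally irregular)."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - (order - 1) * delay):
        window = x[i:i + order * delay:delay]
        counts[patterns.index(tuple(np.argsort(window)))] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(len(patterns))

# Combined index: features = [MF, SEF95, ApEn, SampEn, PeEn] per EEG epoch, target = BIS
# ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000).fit(features, bis)
# combined_index = ann.predict(features)
```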
